Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Authors

  • Arnar Thor Jensson
  • Koji Iwano
  • Sadaoki Furui
Abstract

Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine translated (MT) from English to Icelandic on a word-by-word and a sentence-by-sentence basis. Interpolating the baseline LM with an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as the baseline were very sparse.
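As a rough illustration of the LM interpolation mentioned in the abstract, the sketch below mixes two toy unigram models with a single weight. The abstract does not state the exact interpolation scheme, so plain linear interpolation is an assumption here; the function names, the add-alpha smoothing, the toy token lists, and the weight of 0.7 are all hypothetical and do not come from the paper.

```python
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1.0 - lam) * p_mt[w] for w in p_base}


# Toy stand-ins (assumed): a small in-domain corpus and a larger
# machine-translated corpus, both already tokenized.
baseline_text = "a b a c".split()
mt_text = "a b b b c c d d".split()

# Both models share one vocabulary so the mixture stays a distribution.
vocab = sorted(set(baseline_text) | set(mt_text))
p_base = unigram_lm(baseline_text, vocab)
p_mt = unigram_lm(mt_text, vocab)

# lam is a hypothetical weight; in practice it would be tuned on held-out data.
p_mix = interpolate(p_base, p_mt, lam=0.7)
```

The word "d" never occurs in the small baseline corpus, so the mixture gives it more probability mass than the baseline model alone does. That is the intuition behind borrowing statistics from translated text when in-domain data is sparse.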

Similar resources

Language model adaptation for resource deficient languages using translated data

Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces a technique to improve a LM built using a small amount of task dependent text with the help of a machine-translated text corpus. Perplexity experiments were performed using data, machine translated (MT) from Englis...

Development of a speech recognition system for Icelandic using machine translated text

Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique to improve an LM built using a small amount of task dependent text with the help of a machine-translated text corpus. Icelandic word error rate experiments were performed using data, mach...

Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation

Text corpus size is an important issue when building a language model (LM) in particular where insufficient training and evaluation data are available. In this paper we continue our work on creating a speech recognition system with a LM that is trained on a small amount of text in the target language. In order to get better performance we use a large amount of foreign text and a dictionary mapp...

Cross-Lingual Lexical Triggers in Statistical Language Modeling

We propose new methods to take advantage of text in resource-rich languages to sharpen statistical language models in resource-deficient languages. We achieve this through an extension of the method of lexical triggers to the cross-language problem, and by developing a likelihood-based adaptation scheme for combining a trigger model with an n-gram model. We describe the application of such langua...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Journal:
  • EURASIP J. Audio, Speech and Music Processing

Volume 2008

Publication date: 2008